DOC: make RST files conform to pandas token usage #37393

Closed
wants to merge 3 commits into from

Contributor

@LeviMatus LeviMatus commented Oct 25, 2020

See also: #32316

In #36845 the RST files were updated to normalize usage of the word "pandas". The following tokens were replaced:

  • `pandas`
  • ``pandas``
  • *pandas*
  • **pandas**
  • Pandas

In that PR, I wrongly said that I had caught all patterns. This PR addresses the ones I missed (my apologies).

Misc Note

I'm also working on a rule for code_checks.sh to catch this in the future. I have the logic of the rule, but I'm struggling to get it to output nicely in a way that conforms with the rest of the rules in code_checks.sh. I want to leave this information here in case someone has any ideas I can bounce off of, or in case anyone else wants to try their hand at this. I could be spinning my wheels there for a while and don't want it to hold up getting these changes in.

I used the following rule to find the remaining cases:

    MSG='Check doc/source RST files which format pandas incorrectly'; echo $MSG
    invgrep -R --include=*.rst -Poz '((\*|`)+)pandas\2|((((\.\. code-block:: ipython\s+)(\s +[^\n]*)+)|.\.\ [^\n]+https?[^\n]+|(`+).*\s+<https?[^\n]+>\8(__)?|`.*`_+\s+~+|.*:ref:`.*`)(*SKIP)(?!)|\bPandas\b(?!-))' doc/source/ | sed 's/\x0/\n/g'
    RET=$(($RET + $?)) ; echo $MSG "DONE"

This finds easily identifiable formatting errors (``pandas``, for example), but it also respects article titles and code examples in RST code blocks. It uses grep -z to treat each file's text as a single line, but that causes error messages to lose the line number that the error occurred on (every match in file foo.rst is reported as occurring on line 1). Because of this I'm not including the regex in this PR. If anyone is interested in giving me pointers on getting that bit fixed, or even taking my idea and running with it themselves, please do so.
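One possible way around the lost line numbers, sketched here in Python rather than grep, is to match against the whole file as one string and recover the line number from the match offset. This is only an illustration under assumptions: the function name `find_bad_pandas` is hypothetical, and the pattern is a simplified stand-in for the rule above (it does not reproduce the code-block, link, or `:ref:` exclusions).

```python
import re

# Simplified stand-in for the rule above (an assumption, not a port of it):
# "pandas" wrapped in matching asterisks or backticks, or capitalized "Pandas"
# that is not followed by a hyphen.
BAD_PANDAS = re.compile(r"(\*+|`+)pandas\1|\bPandas\b(?!-)")


def find_bad_pandas(text):
    """Return (line_number, matched_text) pairs for one RST document."""
    hits = []
    for match in BAD_PANDAS.finditer(text):
        # Count the newlines before the match start to get a 1-based line number.
        line_number = text.count("\n", 0, match.start()) + 1
        hits.append((line_number, match.group(0)))
    return hits


sample = "Intro to pandas.\n\nUse **pandas** here.\nPandas is great.\n"
for line_number, token in find_bad_pandas(sample):
    print(f"line {line_number}: {token}")
```

Because the text is still searched as a single string, multi-line exclusions remain possible in principle, though Python's `re` has no `(*SKIP)` verb, so those branches would need to be handled differently (for example by blanking out excluded spans before searching).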

Edit:

Because it's desired to retain ``pandas`` in some places where it contextually makes sense, a regex won't be able to capture such uses. The above regex can instead be:

    MSG='Check doc/source RST files which format pandas incorrectly'; echo $MSG
    invgrep -R --include=*.rst -Poz '(\*+|`)pandas\1|((((\.\. code-block:: ipython\s+)(\s +[^\n]*)+)|.\.\ [^\n]+https?[^\n]+|(`+).*\s+<https?[^\n]+>\7(__)?|`.*`_+\s+~+|.*:ref:`.*`)(*SKIP)(?!)|\bPandas\b(?!-))' doc/source/ | sed 's/\x0/\n/g'
    RET=$(($RET + $?)) ; echo $MSG "DONE"

which will ignore ``pandas`` but catch `pandas`
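A quick way to sanity-check that claim is to try the first alternative of the edited rule in isolation. The sketch below uses Python's `re`, not `grep -P`, so it is only indicative; `NAIVE`, `GUARDED`, and `flags` are hypothetical names. In Python at least, the unguarded form still matches the inner `pandas` of a double-backtick span, and a lookaround guard is one way to avoid that.

```python
import re

# Modeled on the first alternative of the edited rule above.
NAIVE = re.compile(r"(\*+|`)pandas\1")

# Hypothetical guarded variant: a single backtick only counts when it is not
# adjacent to another backtick, so double-backtick spans are left alone.
GUARDED = re.compile(r"(\*+|(?<!`)`(?!`))pandas\1(?!`)")


def flags(pattern, text):
    """Return every substring of text that the pattern would flag."""
    return [m.group(0) for m in pattern.finditer(text)]


for text in ["``pandas``", "`pandas`", "**pandas**"]:
    print(text, flags(NAIVE, text), flags(GUARDED, text))
```

Whether the same guard is needed in the grep rule depends on how the other alternatives interact with it, which this sketch does not cover.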

Contributor

@jreback jreback left a comment

let's standardize but a) need a precommit / code-check rules for this, and it should be ``pandas`` (double-backticks)

@@ -616,7 +616,7 @@ be added with blank lines before and after them.

The way to present examples is as follows:

1. Import required libraries (except ``numpy`` and ``pandas``)

leave this, actually generally we do want to use double-backticks around pandas as it highlights it

@jreback jreback added the Docs label Oct 26, 2020
@jorisvandenbossche
Member

and it should be ``pandas`` (double-backticks)

That's not what we have been doing in a bunch of previous PRs, though.

But, I would say we shouldn't convert all occurrences of "``pandas``" to "pandas". IMO, if it is clearly about the package (and not the project in general), it is fine to keep using ``pandas`` in some places, instead of having a hard rule.

@LeviMatus
Contributor Author

But, I would say we shouldn't convert all occurrences of "``pandas``" to "pandas". IMO, if it is clearly about the package (and not the project in general), it is fine to keep using ``pandas`` in some places, instead of having a hard rule.

Noted. I'll update my PR to change this.

@LeviMatus LeviMatus requested a review from jreback October 30, 2020 17:02
@LeviMatus
Contributor Author

I added ``pandas`` back into the docs where I think it makes sense based on context. It's kind of subjective, so I'd appreciate feedback on 5d9b334 and the remaining diffs in the PR.

@jreback
Contributor

jreback commented Nov 26, 2020

can you merge master and ping on green

@github-actions
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Dec 27, 2020
@MarcoGorelli MarcoGorelli self-requested a review December 27, 2020 12:41
@jreback
Contributor

jreback commented Dec 29, 2020

closing as stale if you want to continue, pls open a new PR.

@jreback jreback closed this Dec 29, 2020
Development

Successfully merging this pull request may close these issues.

DOC: Standardize references to pandas in the documentation
3 participants